Automated compound structure characterization in overhead imagery

ABSTRACT

A system for automatically characterizing areas of interest (e.g., urban areas, forests, and/or other compound structures) in high resolution overhead imagery through manipulation of a dictionary of visual words. The pixels of an input overhead image are initially clustered into a plurality of hierarchically-arranged connected components of a first hierarchical data structure. Image descriptors (e.g., shape, spectral, etc.) of the connected components are then clustered into a plurality of hierarchically-arranged nodes of a second hierarchical data structure. The nodes at a particular level in the second hierarchical data structure become a dictionary of visual words. Subsets of the visual words may be used to label the cells of a grid over the geographic area as falling into one of a number of areas of interest. Categorization information from the grid may be mapped into a resultant image whereby pixels depict their respective type of area of interest.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 14/163,008, filed on Jan. 24, 2014, entitled “AUTOMATED COMPOUND STRUCTURE CHARACTERIZATION IN OVERHEAD IMAGERY”, which claims priority to U.S. Provisional Patent App. No. 61/911,073, entitled “AUTOMATED COMPOUND STRUCTURE CHARACTERIZATION IN OVERHEAD IMAGERY,” and filed on Dec. 3, 2013, the entire contents of which are incorporated herein as if set forth in full.

BACKGROUND

The use of geospatial or overhead imagery has increased in recent years and represents a significant tool that may be utilized in a number of contexts. As such, high quality geospatial imagery has become increasingly valuable. For example, a variety of different entities (e.g., individuals, governments, corporations, or others) may utilize geospatial imagery (e.g., satellite imagery) for a multitude of applications in a variety of contexts.

With increasingly capable satellites being commissioned and launched, very high resolution (VHR) remotely-sensed multispectral Earth imagery has become increasingly available and usable. For example, as the number of satellite image acquisition systems in operation grows, acquisition ability and flexibility improves. In an example, DigitalGlobe, Inc. of Longmont, Colo. currently operates a number of satellites including IKONOS, GeoEye-1, QuickBird, WorldView 1, and WorldView 2, with an anticipated launch of WorldView 3. Accordingly, around-the-clock global coverage may be achieved through the satellite constellation currently in operation. For instance, the DigitalGlobe constellation of satellites can image the entire Earth's landmass every 75 days and may capture over six times the Earth's landmass every year with a capacity to collect at least 2.6 million square kilometers of imagery a day. With selective tasking, DigitalGlobe's satellite constellation may quickly and continually collect imagery from targeted locations to provide near real time feedback on global events or the like.

Furthermore, the resolution of image acquisition satellites also continues to increase. For instance, currently operated satellites may acquire images with a maximum spatial resolution of 50 cm (wherein each pixel in the images corresponds with the distance measure of the spatial resolution). Additionally, planned satellite launches may provide even greater resolution capabilities, with spatial resolutions as fine as about 30 cm or finer (i.e., less than 30 cm, such as 25 cm, 15 cm, or lower).

In this light, the amount and quality of VHR remotely-sensed multispectral Earth imagery continues to increase, as does the amount and types of image data collected. Accordingly, the nature of the VHR remotely-sensed multispectral Earth imagery may facilitate uses beyond simply providing pixels as image data. For instance, higher level data processing may be applied to the images to, for example, identify objects, identify textures, or otherwise extract useful data from the raw image data. One such application has been in land use identification and land classification, where remotely-sensed images are analyzed to categorize pixels of an image into a category of land use or land class. As just one example, image pixels can be categorized or analyzed to identify and/or characterize areas of urban settlement (e.g., the urban “build-up” or “built-up,” such as three-dimensional man-made structures or the like).

As the amount of available image data grows and the nature of the acquired image data changes and improves, advanced image data processing and image analytics are needed to keep pace with advances in image acquisition technology, to facilitate new techniques applied to acquired images, and to expand the number of applications for which such technology may be used.

SUMMARY

Broadly, disclosed herein are utilities (e.g., systems, processes, etc.) for automatically characterizing compound structures in high resolution overhead imagery by treating the overhead imagery as a collection of spatially unordered “visual words” and then analyzing the frequencies (e.g., prevalences) of one or more of the visual words within each cell of a grid of cells over the geographic area to categorize an overall compound structure of the cells. As used herein, a “compound structure” is a composition of image structures representing a concept on the ground, such as built-up, forests, orchards, etc., in portions of a scene or geographic area. As an example, a first subset of all of the visual words may be used to categorize the cells of a first grid as falling within or not within a first compound structure type and a second subset of all of the visual words may be used to categorize the cells of a second grid as falling within or not within a second compound structure type, wherein one or more parameters (e.g., width, overlap) of the cells of the first and second grids are different. Each of the first and second subsets of visual words and the corresponding connected components (groups of pixels) of the overhead imagery described by each of the first and second subsets of visual words may be loaded from storage into memory for fast access and corresponding categorization decisions. One or more of the grids may be mapped into a resultant image of the geographic area whereby each cell depicts (e.g., by color, shading, texturing, etc.) its respective type of compound structure.

Initially, one or more input overhead images (e.g., multispectral) of a particular geographic area may be appropriately decomposed or otherwise broken down into any appropriate image space representation (e.g., space-partitioning data structures for organizing data points, such as a multi-scale segmentation) that automatically groups or clusters pixels of the overhead image(s) into a plurality of connected segments or components that represent “dominant” structures or portions of the image(s) (where the total number of connected components is far less than the total number of pixels in the image(s), such as 10% or less). For example, the input overhead image(s) may be decomposed, by at least one processor of any appropriate computing device(s), into a plurality of connected components of a first hierarchical data structure such as a Min-Tree, a Max-Tree, and/or the like. In the case of a Max-Tree, the hierarchical data structure may be a rooted, uni-directed tree with its leaves corresponding to regional maxima of the input image(s) and its root corresponding to a single connected component defining the background of the input image(s). For instance, the image(s) may be thresholded at each grey level to provide as many binary images as the number of grey levels, where each binary image may be analyzed to derive its connected components.

One or more image descriptors or feature elements (e.g., spectral, geometrical, shape, morphological, etc.) may be derived or otherwise determined (by the processor(s)) from or for each of the connected components that convey important information embedded in the overhead image(s). For instance, the joint usage of spectral and shape descriptors may allow for the differentiation between similar materials that are spatially arranged in different manners. The derived or extracted image descriptors may then be clustered into a plurality of nodes of a second hierarchical data structure that collectively represents the feature or image descriptor space of the overhead image(s). For instance, the nodes of the second hierarchical data structure may be arranged into a plurality of leaf paths that each extend from a single root node of the second hierarchical data structure to one of a plurality of leaf nodes, where one or more of the image descriptors depends from one of the plurality of leaf nodes. As one non-limiting example, the general clustering technique may include a KD-Tree based space partitioning procedure that generates a KD-Tree based on the plurality of image descriptors.

Any “cut” in the second hierarchical data structure may then create a dictionary of visual words that may be used to categorize connected components, and thus each cell or tile of one or more grids overlaid onto the geographic area, as part of the automatic computation of the compound structure representation of the geographic area. As an overly simplistic example, assume that a cut was made at a four-node level of the second hierarchical data structure. In this case, the respective collection of image descriptors depending from each of the four nodes would represent a visual word, resulting in four visual words in this example. For instance, the four visual words in this example could be red circular structures/portions, yellow circular structures/portions, red rectangular structures/portions, and yellow rectangular structures/portions, respectively.

The visual words may then be used, by the processor(s), to categorize the various cells of a grid overlaid onto the geographic area in any appropriate manner. In one embodiment, the respective frequencies of each of the visual words within each cell may be determined and used to characterize the type of compound structure(s) found within the cell. For instance, the at least one image descriptor of each connected component present within the cell may be compared to those of each of the visual words. The connected component may then be classified as the particular visual word whose at least one image descriptor is closest to the at least one image descriptor of the connected component (e.g., as measured by a smallest distance between the at least one image descriptor of the component and the at least one image descriptor of the particular visual word). A similar process may be performed with other connected components present (e.g., entirely) within the cell.

After each component is classified, the respective frequencies of the visual words within the cell may be updated, and then the overall frequencies of the visual words within the cell may be used to categorize the overall type of compound structure(s) of or found within the cell. For example, it may be known that urban areas typically manifest themselves as a first combination of visual word frequencies (e.g., 10% visual word #1, 59% visual word #2, and 31% visual word #3) while orchards typically manifest themselves as a different second combination of visual word frequencies (e.g., 0% visual word #1, 40% visual word #2, and 60% visual word #3). The particular collection of visual word frequencies of each cell (which may, in one embodiment, be used to populate a plurality of corresponding entries in a histogram or vector of entries for the cell) may then be compared to known visual word frequency combinations (e.g., similar to the manner in which connected components were classified as a particular one of the visual words) to categorize the cell as a particular one of a plurality of types of compound structures (e.g., urban areas, fields, orchards, or the like). In one arrangement, a user may be able to manually select (e.g., on a user interface in any appropriate manner) or otherwise provide a number of positive and/or negative examples (e.g., training samples) of each compound structure type for use in the categorization process. In the event that a user provides a positive example of an orchard in the overhead image(s) of the particular geographic area, for instance, the particular visual word frequencies within the selected positive area(s) may be automatically extracted and used in the categorization process of other cells or portions of the geographic area in the manners disclosed herein.
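
A minimal sketch of this signature comparison follows; the signature vectors (taken from the example above), the sample cell histogram, and the use of Euclidean distance are illustrative assumptions, not requirements of the disclosure:

    import numpy as np

    # Hypothetical known signatures: compound structure type -> visual word
    # frequency vector (fractions summing to 1), e.g., from training samples.
    SIGNATURES = {
        "urban":   np.array([0.10, 0.59, 0.31]),
        "orchard": np.array([0.00, 0.40, 0.60]),
    }

    def categorize_cell(cell_histogram):
        """Label a cell with the compound structure type whose known visual
        word frequency signature is closest (smallest Euclidean distance)."""
        hist = np.asarray(cell_histogram, dtype=float)
        hist = hist / hist.sum()  # normalize raw counts to frequencies
        return min(SIGNATURES, key=lambda t: np.linalg.norm(hist - SIGNATURES[t]))

    # A cell whose components were classified 12% / 55% / 33% maps to "urban".
    print(categorize_cell([12, 55, 33]))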

A dictionary with a greater number of specific visual words may be created by making the cut at a level closer to the leaf nodes, while a dictionary with a lower number of more general/broad visual words may be created by making the cut at a level closer to the root node. For instance, a cut at the two-node level of the above example may result in a first visual word of circular structures/portions (all colors) and a second visual word of rectangular structures/portions (all colors). The visual words may then be used to categorize compound structures of the geographic area. Greater numbers of visual words may result in a more fine-grained categorization with reduced computational efficiency, while lower numbers of visual words may result in a more coarse-grained categorization with increased computational efficiency.

One or more particular parameters of the cell grids may be selected to fine-tune the cell grids to identify different types of compound structures, to increase the accuracy of the categorization process, and/or the like. In one embodiment, an overlap parameter may be selected to increase the likelihood that connected components are entirely found within at least one cell (e.g., such as by increasing the degree to which adjacent cells in each grid overlap each other). For instance, increasing the overlap parameter may increase the resolution of the categorization process by increasing the number of cells needed to cover the geographic area, and vice versa. In another embodiment, a width parameter of the cells of the grid may be selected to correspond to the type of compound structure being detected in the overhead image(s) of the geographic area. For instance, the semantic or compound structure “destroyed building” may be best described by finding rubble pieces (e.g., identified by a particular combination of visual word frequencies) within a spatial extent which is about that of a building scale, such as 10 meters, while the semantic or compound structure “orchard” may be best described by finding trees within a spatial extent which has the dimension of a field, greater than about 100 meters. In this case, a first grid could be created having cells with width parameters of 10 meters to identify destroyed buildings and a second grid could be created having cells with width parameters of 100 meters to identify orchards. The first and second grids may both be overlaid onto the overhead image(s) of the geographic area to automatically and simultaneously detect destroyed buildings and orchards within the overhead image(s) of the geographic area as disclosed herein.

Various measures may be taken to increase the computational efficiency of and/or otherwise facilitate the automatic characterization (e.g., categorization) processes disclosed herein. In one arrangement, a subset of visual words from the dictionary of visual words may be selected and used to categorize the cells over the geographic area. Stated differently, as opposed to determining the visual word frequencies of all visual words in the dictionary for each cell, only the frequencies of a smaller, select number of visual words may be determined, where the frequency of the smaller, select number of visual words in a cell is highly revealing or telling as to whether or not a particular compound structure type is present within the cell. In the case of a dictionary having ten visual words, for instance, only three of the visual words may be important in determining whether or not the cells in a geographic area are or are not urban areas. That is, the presence or absence of the other seven visual words may have little or no bearing on whether or not the cells include or represent urban areas. Use of a subset of all visual words as disclosed herein may facilitate the categorization process by limiting or avoiding determination of all visual word frequencies and the corresponding computational resources required therefor.

Continuing with the above example, any connected components within the geographic area classified by one of the three visual words may be identified (e.g., a subset of all connected components). Additionally, any cells containing (e.g., fully) at least one of the connected components classified by one of the three visual words may be identified (e.g., a subset of all cells). At this point, the respective visual word frequencies in each cell in the subset of cells may be obtained and used to determine whether or not such cells represent urban areas. Any portions of cells not in the subset and not overlapped by cells that do represent urban areas may be presumed to not be urban areas. Furthermore, a similar process may be performed with other subsets of visual words and other grids of cells (e.g., having different width parameters and/or overlap parameters) for other compound structure types (e.g., orchards, fields, etc.).

In one arrangement, respective weighting values may be assigned to each of the visual words to convey the relative importance each visual word has in determining whether or not a cell is to be categorized as a particular compound structure type. Again continuing with the above example, the presence or absence of the first of the three visual words in the subset may be the most indicative as to whether or not a particular cell represents an urban area and thus may be assigned a weighting value of 60 out of 100. However, the presence or absence of the second and third of the three visual words in the subset may be less indicative (than the first of the three visual words in the subset) as to whether or not a particular cell represents an urban area and thus may each be assigned lower weighting values of 20 out of 100. In the case of a cell having three connected components contained therein (where each connected component is classified as one of the three visual words in the subset), the frequency of each respective visual word in the cell may be determined (e.g., such as by determining a prevalence of the respective connected component within the cell) and multiplied by its respective weighting value, and then the three products may be added to obtain a sum. For instance, the cell may be categorized as an urban area if the sum meets or exceeds a particular threshold while the cell may not be categorized as an urban area if the sum does not meet the threshold (e.g., or vice versa).
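
A minimal sketch of this weighted-sum decision, reusing the illustrative weights from the example above (60/20/20 out of 100); the threshold value and the input format are assumptions:

    # Hypothetical weights for the three "urban" visual words in the subset.
    WEIGHTS = {"word_1": 60, "word_2": 20, "word_3": 20}
    THRESHOLD = 50  # assumed decision threshold

    def is_urban(frequencies):
        """frequencies: visual word -> prevalence of that word's connected
        components within the cell, as a fraction in [0, 1]. The cell is
        categorized as urban if the weighted sum meets the threshold."""
        score = sum(WEIGHTS[w] * frequencies.get(w, 0.0) for w in WEIGHTS)
        return score >= THRESHOLD

    print(is_urban({"word_1": 0.80, "word_2": 0.15, "word_3": 0.05}))  # True (score 52)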

To allow for rapid queries (e.g., cell categorizations) while considering a limited subset of all of the visual words, an “inverted” file for storing the compound structure representation in a compact or compressed form may be implemented. For instance, an initial classification of all connected components of the first hierarchical data structure for the geographic area as one of the visual words may be performed. Each connected component may then be indexed according to the visual word by which the component is classified and maintained in any appropriate storage device and/or location. For instance, visual word #1 may include a corresponding list of connected components (e.g., where each connected component is identified by dimensions and/or coordinates within the overhead image(s) of the geographic area); visual word #2 may include a corresponding list of connected components; and so on. Before a particular compound structure type categorization process is to begin, only the connected component lists corresponding to the particular visual words needed to categorize the particular compound structure type may be appropriately extracted from the storage location and loaded into memory (e.g., volatile memory) to facilitate access to the visual words and corresponding connected components for purposes of categorizing one or more grids of cells.
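
One way to realize such an inverted file is a simple word-to-components index, sketched below; the per-component record (a bounding box in image coordinates) and the function names are hypothetical:

    from collections import defaultdict

    # Inverted file: visual word id -> list of connected components classified
    # as that word, each recorded here as a (min_row, min_col, max_row, max_col)
    # bounding box within the overhead image.
    inverted_file = defaultdict(list)

    def index_component(word_id, bbox):
        """Append one classified connected component to its word's list."""
        inverted_file[word_id].append(bbox)

    def load_for_query(word_ids):
        """Pull only the component lists needed for one compound structure
        type query (in a real system, read just these lists from storage
        into volatile memory)."""
        return {w: inverted_file[w] for w in word_ids}

    index_component(1, (0, 0, 40, 55))
    index_component(2, (120, 80, 160, 130))
    urban_lists = load_for_query([1, 3, 7])  # e.g., the three "urban" words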

Any of the embodiments, arrangements, or the like discussed herein may be used (either alone or in combination with other embodiments, arrangements, or the like) with any of the disclosed aspects. Merely introducing a feature in accordance with commonly accepted antecedent basis practice does not limit the corresponding feature to the singular. Any failure to use phrases such as “at least one” does not limit the corresponding feature to the singular. Use of the phrases “at least generally,” “at least partially,” “substantially,” or the like in relation to a particular feature encompasses the corresponding characteristic and insubstantial variations thereof. Furthermore, a reference to a feature in conjunction with the phrase “in one embodiment” does not limit the use of the feature to a single embodiment.

In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following descriptions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram illustrating a process of detecting or characterizing compound structures in overhead imagery data.

FIG. 2 is a more detailed block diagram of an automated system for characterizing compound structures in overhead imagery data.

FIG. 3 is a block diagram of a first hierarchical data structure for organizing or clustering pixels of overhead image(s) into connected components.

FIG. 4 is a schematic diagram of a space partitioning procedure for use in hierarchically arranging image descriptors of the components of FIG. 3 into a feature space.

FIG. 5 is a second hierarchical data structure built from the partitioning procedure illustrated in FIG. 4.

FIG. 6 is a schematic diagram of a portion of a grid being overlaid onto one or more overhead images of a geographic area and illustrating connected components within the grid.

FIG. 7 is a simplified depiction of an inverted file for use in indexing connected components according to a visual word by which the connected components are classified.

FIG. 8 is a flow diagram of a method for generating a dictionary of visual words that may be used to characterize compound structures in overhead images of a geographic area.

FIG. 9 is a flow diagram of a method for characterizing compound structures in overhead images of a geographic area using a visual dictionary created according to the method of FIG. 8.

FIG. 10 is a flow diagram of a variation of the method of FIG. 9.

FIG. 11 is a flow diagram of a method for classifying connected components as one of a plurality of visual words in a dictionary.

FIG. 12 is a flow diagram of a method for identifying visual word frequencies within training samples.

FIG. 13 illustrates a multispectral image acquired by the WorldView 2 satellite over a portion of Rio de Janeiro, Brazil.

FIG. 14 illustrates training samples selected for use in detecting a first compound structure type in the multispectral image of FIG. 13.

FIG. 15 illustrates training samples selected for use in detecting a second compound structure type in the multispectral image of FIG. 13.

FIG. 16 illustrates training samples selected for use in detecting a third compound structure type in the multispectral image of FIG. 13.

FIG. 17a illustrates a close up view of a portion of the multispectral image of FIG. 13.

FIG. 17b illustrates visual words detected in the illustration of FIG. 17a that are used to detect the first compound structure type of FIG. 14, where different visual words are differently colored, shaded, etc.

FIG. 18a illustrates a close up view of another portion of the multispectral image of FIG. 13.

FIG. 18b illustrates visual words detected in the illustration of FIG. 18a that are used to detect the second compound structure type of FIG. 15, where different visual words are differently colored, shaded, etc.

FIG. 19a illustrates a close up view of another portion of the multispectral image of FIG. 13.

FIG. 19b illustrates visual words detected in the illustration of FIG. 19a that are used to detect the third compound structure type of FIG. 16, where different visual words are differently colored, shaded, etc.

FIG. 20 is a resultant image or information layer corresponding to the multispectral image of FIG. 13 that depicts the three compound structure types of FIGS. 14-16.

DETAILED DESCRIPTION

Disclosed herein are utilities (e.g., systems, processes, etc.) for automatically characterizing (e.g., categorizing) various types of “compound” structures (e.g., urban areas, residential areas, informal settlements, orchards, fields, forests, etc.) in overhead imagery (e.g., HR/VHR imagery) of geographic areas at various levels of granularity in manners that limit or reduce the use of computational resources. The disclosed utilities broadly treat the overhead imagery as a collection of spatially unordered “visual words” and then analyze the frequencies (e.g., prevalences) of the visual words within each cell of a grid of cells over the geographic area to categorize the cells as one of a number of types of compound structures.

For instance, the pixels of one or more input overhead images of a geographic area may be broken down or decomposed into any appropriate image space representation (e.g., a first hierarchical data structure) that automatically groups pixels into connected components (e.g., dominant segments). Thereafter, one or more image descriptors (e.g., spectral, shape, etc.) of the various connected components may be determined (e.g., derived, obtained, ascertained) and appropriately organized into a feature space that automatically clusters the image descriptors into a plurality of hierarchically-arranged levels of nodes of a second hierarchical data structure. Any “cut” in the second hierarchical data structure at a particular level may then create a dictionary of visual words respectively represented by the image descriptors encompassed by the nodes at the particular level, and the visual words may in turn be used to categorize the cells of one or more grids overlaid onto the geographic area. One or more such grids may be mapped into a resultant image of the geographic area whereby each cell depicts (e.g., by color, shading, texturing, etc.) its respective type of compound structure, and thus various compound structure types within the geographic area as a whole. Resultant images can be used in numerous contexts, such as in assessing population densities, qualities of life, vulnerability factors, disaster risks, sufficiency of civil infrastructures, economic growth, poverty levels, event monitoring and evolution, and the like.

At the outset, it is noted that, when referring to the earth herein, reference is made to any celestial body of which it may be desirable to acquire images or other remote sensing information. Furthermore, when referring to “overhead” imagery herein, such imagery may be obtained by any spacecraft, satellite, aircraft, and/or the like capable of acquiring images or other remote sensing information. Furthermore, the utilities described herein may also be applied to other imaging systems, including imaging systems located on the earth or in space that acquire images of other celestial bodies. It is also noted that the figures contained herein are not necessarily drawn to scale and that such figures have been provided for the purposes of discussion and illustration.

Generally, high resolution images of selected portions of a celestial body's surface have become a product desired and used by government agencies, corporations, and individuals. For instance, many consumer products in common use today include images of the Earth's surface, such as Google® Earth. Various types of remote sensing image collection platforms may be employed, including aircraft, earth-orbiting satellites, and the like. In the case of a consumer digital camera, as one non-limiting example, an image sensor is generally arranged in an area array (e.g., 3,000 rows of 3,000 pixels each, or 9,000,000 total pixels) which collects the image area in a single “snapshot.” In the case of satellite-based imaging, as another non-limiting example, the “push-broom scanning” principle is sometimes employed, whereby each image sensor includes a relatively small number of rows (e.g., a couple) of a great number of pixels (e.g., 50,000 or more) in each row. Each row of pixels may be scanned across the earth to build an image line by line, and the width of the image is the product of the number of pixels in the row times the pixel size or resolution (e.g., 50,000 pixels at 0.5 meter ground resolution produces an image that is 25,000 meters wide). The length of the image is controlled by the scan duration (i.e., number of lines), which is typically settable for each image collected. The resolution of overhead images varies depending on factors such as the particular instrumentation utilized, the altitude of the satellite's (or other aircraft's) orbit, and the like.

Image collection platforms (e.g., aircraft, earth-orbiting satellites, etc.) may collect or acquire various types of imagery in one or more manners. As one non-limiting example, image collection platforms may perform panchromatic collection of scenes of a celestial body, which generally refers to the collection of image data across a single broad range of wavelengths (e.g., all visible light, from near infrared (NIR) to near ultraviolet (NUV), etc.). As another non-limiting example, image collection platforms may additionally or alternatively capture image data within the visible light band and include respective filters to separate the incoming light into red, green, and blue portions. As a further non-limiting example, image collection platforms may additionally or alternatively perform multispectral collection of scenes of a celestial body, which generally refers to the collection of image data at multiple specific spectral bands across the electromagnetic spectrum (e.g., within bands both inside and outside of the visible light range such as NIR, short wave infrared (SWIR), far infrared (FIR), etc.). For instance, a satellite may have one image sensor that is sensitive to electromagnetic radiation across only a first spectral band (e.g., the visible light band, such as a wavelength range of about 380-750 nm) in addition to one or more additional image sensors that are sensitive to electromagnetic radiation only across other spectral bands (e.g., NIR, 750-1400 nm; SWIR, 1400-3000 nm; etc.). Multi-spectral imaging may allow for the extraction of additional information from the radiance received at a satellite after being reflected from the Earth's surface (which may include atmospheric effects such as from aerosols, clouds, etc.).

Turning now to FIG. 1, a simplified block diagram of a system 100 that may be used to automatically characterize various types of compound structures in overhead imagery (e.g., <1-10 m spatial resolution satellite image data obtained by a number of heterogeneous platforms such as SPOT 2 and 5, CBERS 2B, RapidEye 2 and 4, IKONOS® 2, QuickBird 2, WorldView 1 and 2) is disclosed. Initially, the system 100 obtains one or more overhead images of a geographic area from any appropriate overhead image data sources 104 and performs the automated generation 108 of a visual word dictionary from the overhead images as disclosed herein. The system 100 then performs categorization 112 of compound structures in the overhead image(s) using the visual word dictionary and generates resultant images 116 that depict or convey one or more detected or characterized compound structure types in the overhead image(s).

Turning now to FIG. 2, a more detailed block diagram of an automated system 200 that may be used to implement the automated visual word dictionary generation 108 and compound structure categorization or characterization 112 of FIG. 1 is presented. Although depicted as a single device (e.g., server, workstation, laptop, desktop, mobile device, and/or other computing device), one or more functionalities, processes, or modules of the system 200 may be allocated or divided among a plurality of machines, devices, and/or processes which may or may not be embodied in a single housing. In one arrangement, functionalities of the system 200 may be embodied in any appropriate cloud or distributed computing environment.

Broadly, the system 200 may include memory 204 (e.g., one or more RAM or other volatile memory modules, etc.), a processing engine or unit 208 (e.g., one or more CPUs, processors, processor cores, or other similar pieces of hardware) for executing computer readable instructions from the memory 204, storage 212 (e.g., one or more magnetic disks or other non-volatile memory modules or non-transitory computer-readable mediums), and/or a number of other components 216 (e.g., input devices such as a keyboard and mouse, output devices such as a display and speakers, and the like), all of which may be appropriately interconnected by one or more buses 220. While not shown, the system 200 may include any appropriate number and arrangement of interfaces that facilitate interconnection between the one or more buses 220 and the various components of the system 200 as well as with other devices (e.g., network interfaces to allow for communication between the system 200 and other devices over one or more networks, such as LANs, WANs, the Internet, etc.).

The system 200 may appropriately retrieve any one or more overhead images 224 of a geographic area (e.g., from one or more overhead image data sources 104 of FIG. 1, such as the WV2 multispectral image of Rio de Janeiro illustrated in FIG. 13) and store the same in any appropriate form in storage 212 (e.g., such as in one or more databases manageable by any appropriate database management system (DBMS) to allow the definition, creation, querying, update, and administration of the databases). The processing engine 208 may execute a DBMS or the like to retrieve and load the one or more overhead images 224 into the memory 204 for manipulation by a number of engines or modules of the system 200 as will be discussed in more detail below.

As shown, the system 200 may include a “construction” engine 228 that is broadly configured to construct first and second hierarchical data structures from input overhead images of a geographic area to obtain a dictionary of visual words, a “categorization” engine 232 that is broadly configured to categorize a compound structure type (e.g., residential, urban, orchards, etc.) of the input overhead images based on the visual dictionary, and a “mapping” engine 236 that is broadly configured to generate one or more resultant images of the geographic area depicting the categorized compound structure types in the geographic area. Each of the engines (and/or other engines, modules, logic, etc. disclosed and/or encompassed herein) may be in the form of one or more sets of computer-readable instructions for execution by the processing unit 208 that may be manipulated by users in any appropriate manner to perform automated characterization of compound structures for presentation on a display (not shown). In this regard, the combination of the processing unit 208, memory 204, and/or storage 212 (i.e., machine/hardware components) on the one hand and the various engines/modules disclosed herein on the other hand in one embodiment creates a new machine that becomes a special purpose computer once it is programmed to perform particular functions of the characterization utilities disclosed herein (e.g., pursuant to instructions from program software).

In one arrangement, any appropriate portal in communication with the various engines may run on the system 200 and be accessible by users (e.g., via any appropriate browser) to access the functionalities of the system 200. While the various engines have been depicted in FIG. 2 as being separate or distinct modules, it is to be understood that the functionalities or instructions of two or more of the engines may actually be integrated as part of the same computer-readable instruction set and that the engines have been depicted in the manner shown in FIG. 2 merely to highlight various functionalities of the system 200. Furthermore, while the engines have been illustrated as being resident within the (e.g., volatile) memory 204 (e.g., for execution by the processing engine 208), it is to be understood that the engines may be stored in (e.g., non-volatile) storage 212 (and/or other non-volatile storage in communication with the system 200) and loaded into the memory 204 as appropriate.

To facilitate the reader's understanding of the various engines of the system 200, additional reference is now made to FIG. 8, which illustrates a method 800 for use in performing the automated characterization processes disclosed herein. Reference is also now made to FIGS. 3, 4, and 5, which respectively illustrate a first hierarchical data structure, a space partitioning procedure for use in hierarchically arranging image descriptors of connected components of FIG. 3 into a feature space, and a resultant second hierarchical data structure constructed by the partitioning procedure illustrated in FIG. 4. While specific steps (and orders of steps) of the method 800 (as well as other methods disclosed herein) have been illustrated and will be discussed, other methods (including more, fewer, or different steps than those illustrated) consistent with the teachings presented herein are also envisioned and encompassed within the present disclosure.

The method 800 may begin by decomposing 804 the pixels of one or more input overhead images of a particular geographic area into a plurality of hierarchically-arranged connected components (e.g., groups of pixels that collectively define the input image(s) as a whole) of a first hierarchical data structure. With reference to FIGS. 2 and 3, for instance, the construction engine 228 of the automated extraction system 200 may receive one or more input overhead images 304 (e.g., overhead images 224) of a particular geographic area (e.g., the WorldView 2 multispectral image acquired over a portion of Rio de Janeiro, Brazil illustrated in FIG. 13) and break the input image(s) 304 down into a plurality of connected components 308/244 (e.g., nodes) of a first hierarchical data structure 240/300 (e.g., Min-Tree, Max-Tree, etc.).

In the case of the first hierarchical data structure 300 being in the form of a Max-Tree, for example, the first hierarchical data structure 300 may be a rooted, uni-directed tree with its leaf components 312 (e.g., leaf nodes) corresponding to regional maxima of the input overhead image(s) 304 and its root component 316 (e.g., root node) corresponding to a single connected component defining the background of the input overhead image(s). For instance, the overhead image(s) 304 may be thresholded at each of a number of intensity or grayscale levels to provide as many binary images as the number of grey levels, where each binary image may then be analyzed to derive its connected components 308. At least one morphological attribute filter (e.g., an edge preserving operator) may progressively accept (or reject) connected components 308 of the tree based on some attribute criterion.

For instance, image descriptor or attribute openings and closings may be used whereby the intensity value (e.g., grayscale) of each connected component 308 is assessed at each of a number of progressively increasing and decreasing predetermined threshold intensity or grayscale levels, and the component is rejected if the intensity value is not higher or lower than each respective progressively increasing or decreasing predetermined intensity level. The hierarchical ordering of the components 308 may encode the nesting of peak components (e.g., pixels with intensities greater than a level “h”) with respect to the gray-scale range of the input image(s) 304. Each component 308 may generally point to its parent (i.e., the first ancestor node 308 below the given level) while the root component 316 points to itself. While a Max-Tree has been discussed, it is to be understood that many other manners of partitioning the one or more overhead images 224 into a plurality of connected components collectively representing the pixels of the overhead image(s) 224 are also encompassed herein. For instance, a minimum spanning tree defined on the image edges or the tree of shapes (e.g., a merge of a Max-Tree with a Min-Tree) may be utilized whereby pixels of the input overhead image(s) may be hierarchically grouped into components. Such a minimum spanning tree is disclosed in “Fast Computation of a Contrast-Invariant Image Representation” by P. Monasse and F. Guichard as published in Volume 9, Issue 5 of IEEE Transactions on Image Processing in 2000 (doi: 10.1109/83.841532), the entirety of which is incorporated herein by reference as if set forth in full.
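
A greatly simplified sketch of the threshold decomposition underlying a Max-Tree follows, using scipy.ndimage.label for connected component extraction; a production Max-Tree would use an efficient flooding or union-find construction rather than this per-level relabeling:

    import numpy as np
    from scipy import ndimage

    def threshold_decomposition(image):
        """For each grey level h, extract the connected components of the
        peak set {pixels >= h}. The nesting of these components across
        levels is what a Max-Tree encodes."""
        components = []  # (level, labeled image, component count)
        for h in np.unique(image):
            labels, n = ndimage.label(image >= h)
            components.append((int(h), labels, n))
        return components

    # Two regional maxima (values 2 and 3) over a zero background.
    img = np.array([[0, 0, 0, 0, 0],
                    [0, 2, 0, 3, 0],
                    [0, 2, 0, 3, 0],
                    [0, 0, 0, 0, 0]], dtype=np.uint8)
    for level, labels, n in threshold_decomposition(img):
        print(f"level {level}: {n} connected component(s)")  # 1, 2, 1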

The method 800 of FIG. 8 also includes deriving 808 one or more image descriptors 320/248 for each of the connected components 308. For instance, spectral descriptors may be derived by determining the individual satellite image band averages in each component 308, while shape descriptors may be derived such as the area, eccentricity, image moments (Hu), and the like of the pixels represented by the component 308. In one embodiment, each component may include a pointer to a data structure that stores auxiliary data for the component 308, where the construction engine 228 derives or otherwise determines one or more image descriptors 320 for each connected component 308/244 from such auxiliary data. The first hierarchical data structure 300 may thus allow for compact storage of the connected components 308 from all grey levels while having a limited computational complexity. In any case, the first hierarchical data structure 300/240 may be appropriately stored in memory 204 for quick retrieval during subsequent steps of the method 800.
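
A sketch of descriptor derivation for labeled components is given below, pairing the shape descriptors named above (area, eccentricity, Hu moments) from skimage.measure.regionprops with per-band means as spectral descriptors; the labeled component image and the band stack are assumed inputs:

    import numpy as np
    from skimage.measure import regionprops

    def derive_descriptors(labels, bands):
        """labels: 2-D integer image of connected components (0 = background).
        bands: array of shape (n_bands, rows, cols) for the multispectral image.
        Returns a descriptor vector per component: shape descriptors (area,
        eccentricity, Hu moments) concatenated with per-band mean intensities."""
        descriptors = {}
        for region in regionprops(labels):
            shape = np.concatenate(([region.area, region.eccentricity],
                                    region.moments_hu))
            rr, cc = region.coords[:, 0], region.coords[:, 1]
            spectral = bands[:, rr, cc].mean(axis=1)  # band averages
            descriptors[region.label] = np.concatenate((shape, spectral))
        return descriptors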

In the case of a multi-spectral image, for instance, the various spectral bands may be fused into a single band in any appropriate manner before the image is decomposed into the first hierarchical data structure. For example, the spectral bands of an 8-band multi-spectral image may be fused into a single band by way of the following built-up (BU) index, which may be computed for each pixel of the input image:

$BU = \frac{RE - NIR2}{RE + NIR2}$

where “RE” is the intensity of electromagnetic radiation received in the red edge band of the electromagnetic spectrum, and “NIR2” is the intensity of electromagnetic radiation received in the NIR2 band of the electromagnetic spectrum. In this regard, the pixels of the 8-band multispectral image may be broken and arranged into a plurality of hierarchical components based on the respective BU values of the pixels.

As another example, the spectral bands of a 4-band multi-spectral image may be fused into a single band by way of the following built-up (BU) index, which may be computed for each pixel of the input image:

$BU = \frac{R - NIR}{R + NIR}$

where “R” is the intensity of electromagnetic radiation received in the red band of the electromagnetic spectrum, and “NIR” is the intensity of electromagnetic radiation received in the NIR band of the electromagnetic spectrum. In this regard, the pixels of the 4-band multispectral image may be broken and arranged into a plurality of hierarchical components based on the respective BU values of the pixels. While the above BU indexes have been discussed, almost any meaningful band ratio may be utilized, such as the Normalized Difference Vegetation Index (NDVI or (NIR-VIS)/(NIR+VIS)), the Normalized Difference Water Index (NDWI or (Green-NIR)/(Green+NIR)), and/or the like.
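
Either index reduces to a couple of array operations per pixel; in the sketch below the band arrays are assumed inputs and a small epsilon guards against division by zero:

    import numpy as np

    def bu_index_8band(red_edge, nir2, eps=1e-9):
        """Built-up index for an 8-band image: (RE - NIR2) / (RE + NIR2)."""
        re, n2 = red_edge.astype(float), nir2.astype(float)
        return (re - n2) / (re + n2 + eps)

    def ndvi(nir, vis, eps=1e-9):
        """Normalized Difference Vegetation Index: (NIR - VIS) / (NIR + VIS)."""
        n, v = nir.astype(float), vis.astype(float)
        return (n - v) / (n + v + eps)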

As shown in FIG. 8, the method 800 may also include constructing 812, with the derived image descriptors 320/248 of the connected components 308/244, a second hierarchical data structure 252 that includes a plurality of hierarchically-arranged nodes, where each of the image descriptors 320/248 depends from one or more of the nodes, and where the second hierarchical data structure 252 represents an image descriptor space of the first hierarchical data structure 240. Constructed second hierarchical data structure(s) 252 may be appropriately stored in memory 204 for quick retrieval during subsequent steps of the method 800.

In one arrangement, the second hierarchical data structure 252 may be in the form of a KD-Tree, and in this regard FIG. 4 illustrates a simplified KD-Tree-based space partitioning procedure 400 that may be used to build the second hierarchical data structure 252. It is to be understood, however, that other manners of hierarchically organizing the image descriptor space of the first hierarchical data structure 240 are also encompassed herein (e.g., image descriptor clustering based on dissimilarity measures or the like). As seen in FIG. 4, and in the case of the construction engine 228 deriving first and second image descriptors (e.g., a shape descriptor and a spectral descriptor) for each of the connected components 308/244 (see FIGS. 2-3), the construction engine 228 may initially dispose respective data points 405 (e.g., each representing a fusion or concatenation of the first and second image descriptors) for each of the connected components 308/244 at appropriate locations on an x, y coordinate system as shown in a first step 404 of the procedure 400. As just one example, the x-axis may correspond to the shape of each component 308/244 (e.g., the shape collectively represented by the image pixels making up the component, where, for instance, the left side of the x-axis represents more curvilinear shapes and the right side represents more polygonal shapes) and the y-axis may correspond to the color of each component 308/244 (e.g., the color collectively represented by the image pixels making up the component, where, for instance, the bottom of the y-axis represents darker colors and the top represents brighter colors).

The construction engine 228 may then create a root node f₁ as shown in a second step 408 by splitting the data points 405 into two groups with a vertical line through the median x-coordinate of the data points 405. A similar procedure may then be performed to create child nodes f₂, f₃, as shown in a third step 412, only with respective horizontal lines through the respective median y-coordinates of the data points 405 on either side of the root node f₁. The splitting may then continue recursively to create leaf nodes f₄, f₅, f₆, f₇ as shown in a fourth step 416, where one or more data points 405 (e.g., up to a maximum of “m” data points 405 which may be appropriately designated in advance) may depend from each of the leaf nodes f₄, f₅, f₆, f₇.
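
The recursive median splitting of steps 404-416 can be sketched as follows; the dictionary-based node layout is an illustrative choice, while the alternating split axis and the leaf capacity m mirror the description above:

    import numpy as np

    def build_kdtree(points, depth=0, m=2):
        """Recursively split descriptor points at the median coordinate,
        cycling through the axes (x = shape, y = color in the
        two-dimensional example of FIG. 4). Leaves hold at most m points."""
        if len(points) <= m:
            return {"leaf": True, "points": points}
        axis = depth % points.shape[1]          # alternate split dimension
        points = points[np.argsort(points[:, axis])]
        median = len(points) // 2
        return {"leaf": False,
                "axis": axis,
                "split": points[median, axis],  # position of the splitting line
                "left": build_kdtree(points[:median], depth + 1, m),
                "right": build_kdtree(points[median:], depth + 1, m)}

    # Eight components, each a fused (shape, color) descriptor pair.
    tree = build_kdtree(np.random.rand(8, 2))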

FIG. 5 illustrates a simplified second hierarchical data structure 500 built from the nodes f₁-f₇ and data points 405 illustrated in the fourth step 416 of FIG. 4. As shown, the second hierarchical data structure 500 includes a plurality of levels of nodes, such as one level that includes nodes f₂, f₃, another level that includes leaf nodes f₄, f₅, f₆, f₇, etc. Generally, each data point 405 depends from one of the leaf nodes f₄, f₅, f₆, f₇, and each of the nodes f₁-f₇ collectively includes the image descriptors of all nodes and/or data points 405 depending therefrom. As an example, node f₄ includes (e.g., as determined during the partitioning procedure 400 of FIG. 4) all the image descriptors associated with a first data point 504 that includes yellow circular structures and a second data point 508 that includes red circular structures. In this regard, node f₄ may generally represent bright colored circular structures/portions. As another example, node f₇ includes all the image descriptors associated with a third data point 512 that includes gray triangular structures and a fourth data point 516 that includes brown triangular structures. In this regard, node f₇ may generally represent dark colored polygonal structures/portions. While a two-dimensional space partitioning procedure 400 is illustrated in each of the steps of FIG. 4, it is to be understood that more complicated space partitioning procedures may be employed to accommodate more than two dimensions of image descriptors.

Turning back to FIG. 8, the method 800 includes identifying 816 a particular node level in the second hierarchical data structure 500 and then loading 820 an image descriptor category or classification represented by each of the nodes at the particular node level into a visual dictionary as a respective plurality of visual words. For instance, assume that the node level including nodes f₄, f₅, f₆, f₇ is selected as the particular node level at which to create the visual dictionary. Also assume that node f₄ generally represents bright circular structures/areas, node f₅ generally represents dark ovular structures/areas, node f₆ generally represents bright rectangular structures/areas, and node f₇ generally represents dark triangular structures/areas, respectively. In this case, the respective image descriptor types encompassed by each of the nodes f₄, f₅, f₆, f₇ would become a respective number of visual words 520. For instance, a first visual word 520 (“Visual Word₁”) would include image descriptors encompassing bright circular structures/areas, a second visual word 520 (“Visual Word₂”) would include image descriptors encompassing dark ovular structures/areas, a third visual word 520 (“Visual Word₃”) would include image descriptors encompassing bright rectangular structures/areas, and a fourth visual word 520 (“Visual Word₄”) would include image descriptors encompassing dark triangular structures/areas.

The visual words may then be loaded into a visual word dictionary for use by the categorization engine 232 (e.g., see visual words 256 being loaded into visual word dictionary 260 in FIG. 2). Selection of different node levels in the second hierarchical data structure 500/252 creates additional or different visual words at varying levels of granularity. For instance, selecting node levels closer to the root node f₁, such as the level including nodes f₂, f₃, would create two (e.g., coarser) visual words respectively including curvilinear structures (e.g., regardless of color) and polygonal structures (e.g., regardless of color), while selecting node levels (or data points 405) farther away from the root node f₁ creates higher numbers of finer-grained visual words. Multiple dictionaries 260 including different combinations of visual words can be stored in storage 212 (see FIG. 2) for use in categorizing different types of compound structures at varying granularity levels as will be discussed in more detail below.
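
A “cut” at a given depth then simply collects the subtrees reached at that depth, each of which becomes one visual word; the toy dictionary-based tree below is an illustrative stand-in for the structure of FIG. 5, not the disclosed implementation:

    def cut_dictionary(node, depth):
        """Collect the subtrees at a given depth of the tree; each collected
        subtree (with every data point depending from it) is one visual word."""
        if depth == 0 or node["leaf"]:
            return [node]
        return (cut_dictionary(node["left"], depth - 1) +
                cut_dictionary(node["right"], depth - 1))

    # Toy stand-in for FIG. 5: f1 at the root, f4-f7 two levels down, with
    # data points 504/508 under f4 and 512/516 under f7.
    leaf = lambda pts: {"leaf": True, "points": pts}
    f4, f5 = leaf(["504", "508"]), leaf(["..."])
    f6, f7 = leaf(["..."]), leaf(["512", "516"])
    tree = {"leaf": False, "left": {"leaf": False, "left": f4, "right": f5},
            "right": {"leaf": False, "left": f6, "right": f7}}

    print(len(cut_dictionary(tree, 2)))  # 4 visual words (f4-f7)
    print(len(cut_dictionary(tree, 1)))  # 2 coarser visual words (f2, f3)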

In this regard, reference is now made to FIG. 9, which illustrates a flow diagram of a method 900 for characterizing (e.g., categorizing) compound structures in overhead images of a geographic area (e.g., the WorldView 2 multispectral image acquired over a portion of Rio de Janeiro, Brazil illustrated in FIG. 13) using a visual word dictionary, such as one or more visual word dictionaries 260 created according to the method 800 of FIG. 8. At 904, a particular type of compound structure to be identified (e.g., built-up, orchards, forests, etc.; finer granularities of built-up such as urban, residential, and informal settlements, etc.; and/or the like) in the one or more overhead images of the geographic area decomposed in the method 800 of FIG. 8 may be selected. It may then be queried 905 whether a frequency or prevalence of at least some of the visual words for the selected compound structure type (e.g., a compound structure type visual word signature) is known. In response to a positive answer to the query at 905, the method may proceed to overlay 908 a grid of cells onto the overhead image(s) of the geographic area for use in categorizing one or more compound structure types in the geographic area as will be discussed in more detail below.

Before discussing step 908 and subsequent steps in more detail, the method 900 may, in response to a negative answer to the query at 905, proceed to determine 906 the visual word frequency/prevalence of the selected compound structure type for use in the categorization of unknown portions of the geographic area. Turning now to the method 1200 of FIG. 12, at least one training sample or region of interest (ROI) in the overhead image(s) of the geographic area that represents the selected compound structure type may be selected 1204. In the case of the overhead image of Rio de Janeiro, Brazil illustrated in FIG. 13, for instance, FIG. 14 illustrates the selection of first and second regions of interest 1404, 1408 (e.g., via a user interface in any appropriate manner) representing selected compound structure types in the form of informal settlements (e.g., slums) in the overhead image of FIG. 13. As further examples, FIGS. 15 and 16 illustrate the selection of additional ROIs 1504, 1508 and 1604, respectively, for other selected compound structure types (e.g., residential and industrial, respectively). The method 1200 may then classify 1208 each connected component (e.g., connected components 244/308 of FIGS. 2-3) resident within the selected ROI as one of the visual words (e.g., visual words 256 of dictionary(ies) 260 in FIG. 2; visual words 520 in FIG. 5).

With brief reference now to the method 1100 of FIG. 11, the one or more image descriptors of each connected component may be initially compared 1104 and/or otherwise analyzed in relation to the one or more image descriptors of each visual word. For instance, each of the first and second ROIs 1404, 1408 in FIG. 14 may include one or more connected components 1412 therein, each of which includes or is described by one or more image descriptors (e.g., image descriptors 248 in FIG. 2 and 320 in FIG. 3). For each connected component, the method 1100 may then include ascertaining 1108 a smallest distance (e.g., via a dissimilarity analysis or the like) between the image descriptor(s) of the connected component and the image descriptor(s) of each of the visual words and then classifying 1112 the connected component as the visual word associated with the smallest ascertained distance. For example, assume that the visual words included the four visual words 520 of FIG. 5. In the case where the image descriptor(s) of one of the connected components 1412 included bright green, somewhat square or rectangular structures, for instance, the connected component 1412 may be classified as Visual Word₃ 520 of FIG. 5. A similar process may be performed to classify the other connected components within the selected ROIs as one of the visual words.
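
The smallest-distance classification of steps 1104-1112 reduces to a nearest-centroid rule; in the sketch below each visual word is summarized by a single representative descriptor, which is one plausible realization rather than the only one:

    import numpy as np

    def classify_component(descriptor, word_descriptors):
        """Return the index of the visual word whose representative image
        descriptor lies at the smallest Euclidean distance from the
        connected component's descriptor."""
        d = np.asarray(descriptor, dtype=float)
        return int(np.argmin([np.linalg.norm(d - w) for w in word_descriptors]))

    # Hypothetical fused (shape, brightness) descriptors for Visual Words 1-4.
    words = [np.array([0.2, 0.9]),   # bright circular
             np.array([0.3, 0.2]),   # dark ovular
             np.array([0.8, 0.9]),   # bright rectangular
             np.array([0.9, 0.1])]   # dark triangular
    # A bright, somewhat rectangular component lands on Visual Word 3 (index 2).
    print(classify_component([0.7, 0.85], words))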

Returning to FIG. 12, the method 1200 may then include determining 1212 the frequencies of the visual words within the one or more ROIs corresponding to the selected compound structure type, such as by analyzing the visual word classification of the connected components within the one or more ROIs for the selected compound structure type. For instance, a histogram including the frequencies of each of the visual words within the one or more ROIs may be expressed in the form of the following vector x:

$x = [a_1(\text{Visual Word}_1), a_2(\text{Visual Word}_2), a_3(\text{Visual Word}_3), a_4(\text{Visual Word}_4), \ldots, a_n(\text{Visual Word}_n)]$

where each of the entries a₁, a₂, etc. corresponds to the frequency of a respective one of the visual words within the spatial extent of the one or more ROIs.

For instance, each time one of the connected components 1412 within the first and second ROIs 1404, 1408 is classified as one of the visual words, the frequency entry for the respective visual word may be increased in the vector x for the selected compound structure type. A similar process may be performed to obtain visual word frequency vectors x for other compound structure types, such as those corresponding to the ROIs in FIGS. 15 and 16, where the considered visual words may be the same as or different than those considered in relation to other compound structure type ROIs (e.g., from different visual dictionaries 260 constructed from visual words 256). The categorization engine 232 may maintain or at least have access to any appropriate data structures (e.g., in memory 204 and/or in storage 212) including compound structure types 264, respective training samples 268 (e.g., ROIs), visual words 256 (and their respective frequencies), and/or the like.
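
Building the vector x for one compound structure type is then a counting pass over the classified components in its ROI(s); the normalization to frequencies below is an assumption (raw counts would serve equally for comparison):

    import numpy as np

    def roi_signature(component_words, n_words):
        """component_words: visual word index (0-based) assigned to each
        connected component inside the ROI(s) for one compound structure
        type. Returns the vector x of per-word frequencies a_1 ... a_n."""
        x = np.bincount(component_words, minlength=n_words).astype(float)
        return x / x.sum()

    # Seven components in the informal settlement ROIs, classified
    # among four visual words:
    print(roi_signature([1, 2, 2, 1, 3, 2, 1], n_words=4))
    # -> [0.  0.429  0.429  0.143] (approximately)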

With reference back to FIG. 9, the method 900 may include overlaying 908 a grid having a plurality of adjacent cells (e.g., tiles, such as square-shaped tiles) onto the overhead image of the geographic area, where the compound structure type of each cell is to be categorized as will be discussed in more detail below. With reference to FIG. 6, for instance, a portion of one or more overhead images of a geographic area 600 (e.g., the overhead image of Rio de Janeiro, Brazil illustrated in FIG. 13) is presented, where a portion of a grid 604 including a plurality of cells 608₁, 608₂, 608₃, 608₄ is disposed thereover. The grid 604 may include a number of parameters that define the size of the cells as well as the orientation of the cells relative to adjacent cells.

One parameter may be a width parameter w that generally defines a width of each of the cells and may correspond to the spatial extent of the selected compound structure type or semantic (e.g., of step 904 of FIG. 9) to increase the descriptive power of the visual word frequencies within the cells. For instance, the semantic or compound structure "destroyed building" may be best described by finding rubble pieces (e.g., identified by a particular combination of visual word frequencies) within a spatial extent on the order of a building scale, such as 10 meters, while the semantic or compound structure "orchard" may be best described by finding trees within a spatial extent on the order of a field, such as greater than about 100 meters. In this case, a first grid could be created having cells with a width parameter w of 10 meters to identify destroyed buildings and a second grid could be created having cells with a width parameter w of 100 meters to identify orchards. The first and second grids may both be overlaid onto the overhead image(s) of the geographic area to automatically and/or simultaneously detect destroyed buildings and orchards within the overhead image(s) of the geographic area as disclosed herein.

Another parameter may be an overlap parameter r that generally defines a vertical and/or horizontal displacement between successive or adjacent cells and thus controls the number of cells produced (e.g., where r may be the same or different in the vertical and horizontal directions). For instance, the overlap parameter r may be defined between left edges of adjacent cells, such as between the left edges of cells 608₁ and 608₃ in FIG. 6. In this regard, a smaller overlap parameter r would result in a greater number of cells and thus a higher sampling resolution of the geographic area 600. As another example, the overlap parameter r may be defined between the left edge of cell 608₃ and the right edge of cell 608₁ in FIG. 6 (i.e., as the extent of the overlapping region itself) such that a smaller overlap parameter would result in a reduced number of cells. In any event, the overlap parameter r may be selected to be a constant fraction of the width parameter w to provide sufficient spatial sampling for the spatial representation of the considered or selected compound structure type. The categorization engine 232 may maintain or at least have access to any appropriate data structures (e.g., in memory 204 and/or in storage 212) including a plurality of grids 272 (e.g., each configured for detecting a different type of compound structure in the overhead image(s) 224 of the geographic area), where each includes a plurality of cells 276 defined by one or more parameters (e.g., width parameter w, overlap parameter r, and/or the like). See FIG. 2.
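
As a minimal sketch of how such a grid might be enumerated (assuming the first definition of r above, i.e., the displacement between like edges of adjacent cells; under the second definition the step between cells would instead be w − r), the cells of a grid 272 may be generated as follows. The function name and tuple layout are hypothetical.

```python
def grid_cells(image_width, image_height, w, r):
    """Enumerate the (left, top, width, height) extents of the square
    cells of a grid with width parameter w, where r is the displacement
    between the left (or top) edges of adjacent cells; a smaller r
    yields more cells and a finer spatial sampling."""
    cells = []
    for top in range(0, image_height - w + 1, r):
        for left in range(0, image_width - w + 1, r):
            cells.append((left, top, w, w))
    return cells
```

For instance, with w = 100 and r = 20 (a constant fraction of w), each cell overlaps its horizontal neighbor by 80 units and the grid samples the image every 20 units.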

Referring again to FIG. 9, the method 900 includes identifying 912 visual words 260 from a dictionary 256 (see FIG. 2) corresponding to the selected compound structure type. Stated differently, the method 900 may include identifying 912 visual words analyzed during the determining step 906 in relation to the selected compound structure type. In one arrangement, the method 900 may identify all of the visual words analyzed during the determining step 906. In another arrangement, however, the method 900 may identify a subset I of all of the visual words considered during the determining step 906 or a subset of all of the words in the dictionary. More particularly, many or most compound structure types can be described or identified by less than all of the visual words 260 in the particular dictionary 256 under consideration. That is, the presence, absence, and/or prevalence of some visual words in a dictionary may be more important or revealing than other visual words as to whether or not a particular compound structure type is present within or otherwise represented by a particular portion of a geographic area (e.g., within a particular cell of the geographic area).

In one arrangement, a feature selection technique may be used to infer the visual word subset I by aiming to identify the relevant attributes (e.g., relevant visual words) for the particular compound structure type while limiting redundancy therebetween. Among the various feature selection paradigms, filtering methods may use a fast proxy measure to score a feature subset. For instance, common proxy measures include the mutual information and the Pearson correlation coefficient. A simple yet effective method may include ranking all of the visual word frequencies with respect to the classes based on their Pearson's correlation. The visual words which have an absolute correlation above a given threshold may be retained to be part of the subset I.
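
A minimal sketch of this filtering method follows, assuming a matrix of visual word frequency vectors over labeled training ROIs; the function and parameter names are hypothetical.

```python
import numpy as np

def select_word_subset(frequencies, labels, threshold):
    """Rank visual words by the absolute Pearson correlation between
    their frequencies across labeled training samples and the class
    labels, and retain those above the threshold as the subset I.

    frequencies: (n_samples, n_words) visual word frequency vectors.
    labels: (n_samples,) class labels, e.g., +1/-1."""
    freqs = np.asarray(frequencies, dtype=float)
    y = np.asarray(labels, dtype=float)
    if y.std() == 0:
        return []  # a single class carries no correlation information
    subset = []
    for j in range(freqs.shape[1]):
        column = freqs[:, j]
        if column.std() == 0:
            continue  # a constant frequency carries no class information
        correlation = np.corrcoef(column, y)[0, 1]
        if abs(correlation) > threshold:
            subset.append(j)
    return subset
```

In the experiment described below, a threshold of 0.4 on the Pearson's correlation retains between 10% and 20% of 512 visual words per query.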

In another arrangement, a fixed percentage of the highest frequency visual words in the one or more ROIs for the selected compound structure type (from step 1212 of FIG. 12) may be retained and identified in step 912 of FIG. 9. In the event that subsequent compound structure type categorizations are inaccurate or otherwise imprecise, a user may be able to incrementally select greater and greater percentages of visual words from the ROI to obtain a result of any desired accuracy with any acceptable level of computational speed. In a further arrangement, any visual words whose respective frequencies only nominally change between ROIs of different compound structure types may not be included in the visual word subset I, or may be included but afforded less importance than other visual words in the subset I (e.g., via respective weighting values).

Again with reference to FIG. 9, the method 900 may include determining 916, for each cell 276 in the particular grid 272 for the selected compound structure type, a frequency distribution of the identified 912 visual words (e.g., visual word subset I) and then categorizing 920 the cells 276 as either representing or not representing the selected compound structure type (e.g., based on the determined cell visual word frequency distributions and those of the selected compound structure type). In one arrangement, the determining 916 may include classifying all connected components 244 (see FIG. 2) that fall within each respective cell 276 in the particular grid 272 as one of the identified 912 visual words (e.g., via the method 1100 of FIG. 11) and then determining the frequencies of each of the identified 912 visual words within the cell (e.g., where a connected component covering half of a cell would contribute more to its respective visual word's "frequency" within the cell than would another connected component covering only ten percent of the cell).

In another arrangement, the determining 916 may include identifying 924 a subset of connected components 244 in the grid 272 that are classified by a visual word of the identified 912 visual words and then identifying 928 a subset of cells 276 in the grid 272 that fully contain at least one of the identified 924 connected components 244 (e.g., rather than connected components that are cut into two parts by adjacent cells 276). That is, rather than necessarily considering all of the cells 276 in the grid 272, only a subset of the cells 276 that fully contain a connected component classified by one of the identified 912 visual words may be considered. This arrangement advantageously reduces the computational complexity of the categorization processes disclosed herein by avoiding application of the visual word frequency analysis to connected components not classified by one of the identified 912 visual words in the first place as well as by limiting what may be arbitrary cuts through connected components 244. In one variation, the identifying step 924 may be limited to lower level sets of the connected components 244 of the first hierarchical data structure 300, as such connected components may be more likely to identify "salient" or important structures/portions in the geographic area.

For all cells identified at 928, the visual word frequency distribution of each such cell may be determined 932. For each identified 928 cell, for instance, a histogram including the frequencies of each of the identified 912 visual words within the cell may be expressed in the form of the following vector y:

yᵢ = [b₁(Identified Visual Word₁), b₂(Identified Visual Word₂), . . . , bₙ(Identified Visual Wordₙ)],

where each of the entries b₁, b₂, etc. corresponds to the frequency of a respective one of the identified 912 visual words within the spatial extent of the respective identified 928 cell.

With reference to FIG. 6, for instance, a plurality of connected components 612, 616, 620, 624, 628 (e.g., connected components 244) are respectively disposed over the geographic area 600 and have various shape and spectral parameters as respectively depicted by the outline of the connected components and the hashing of the connected components. Assume that all of the connected components 612, 616, 620, 624, 628 were identified 924 as being classified by one of the identified 912 visual words for the selected 904 compound structure type (where any additional connected components not classified by one of the identified 912 visual words have not been shown in the interest of clarity). However, assume that only cells 608₁, 608₂, 608₃ are identified 928 as containing (e.g., fully) one of the identified 924 connected components.

In this regard, the visual word frequency vectors y may be automatically determined 932 for each of cells 608₁, 608₂, 608₃ using the connected components contained fully therein as well as the visual words by which the components have been classified. For instance, assume that Visual Word₁, Visual Word₂ and Visual Word₄ in FIG. 5 were identified 912 as representing the selected 904 compound structure type. In this case, the visual word frequency vector for each of cells 608₁, 608₂, 608₃ may be expressed as follows:

yᵢ = [b₁(Visual Word₁), b₂(Visual Word₂), b₄(Visual Word₄)]

In the case of cell 608₂, for instance, its vector y may be determined using connected components 616 and 624. For instance, assume that connected component 616 was classified as Visual Word₄ of FIG. 5 and that connected component 624 was classified as Visual Word₁ of FIG. 5. Here, the frequency vector for cell 608₂ may be expressed as follows:

y₆₀₈₂ = [12, 0, 30],

where one or more connected components (in this case one connected component) classified as Visual Word₁ have a frequency of about 12% in cell 608₂ (e.g., consume an area that is about 12% of the area of cell 608₂), no connected components are classified as Visual Word₂, and one or more connected components (in this case one connected component) classified as Visual Word₄ have a frequency of about 30% in cell 608₂ (e.g., consume an area that is about 30% of the area of cell 608₂). Vectors y may be similarly determined for cells 608₁ and 608₃. In this example, it can be seen how connected component 624 may contribute to each of the visual word frequency vectors of cells 608₂, 608₃.
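
A minimal sketch of the determining 932 step follows, assuming the area of each fully contained connected component within the cell has been measured and that frequencies are expressed as percentages of the cell area (consistent with the example above); the names are hypothetical.

```python
def cell_frequency_vector(components, identified_words, cell_area):
    """Build the vector y for one cell.

    components: (word_index, area_within_cell) pairs for the connected
    components fully contained in the cell.
    identified_words: the visual word subset I for the selected compound
    structure type (step 912)."""
    y = [0.0] * len(identified_words)
    position = {word: k for k, word in enumerate(identified_words)}
    for word_index, area in components:
        if word_index in position:
            # each component contributes the percentage of the cell's
            # area that it covers to its visual word's frequency
            y[position[word_index]] += 100.0 * area / cell_area
    return y
```

For cell 608₂ above, with component 624 (Visual Word₁, about 12% of the cell) and component 616 (Visual Word₄, about 30% of the cell) and I = {Visual Word₁, Visual Word₂, Visual Word₄}, the sketch returns [12.0, 0.0, 30.0].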

Again with reference to FIGS. 2 and 9, the method 900 may then include categorizing 920 each cell 276 in the grid 272 as representing or not representing the selected 904 compound structure type. As shown, the method 900 may include analyzing 936 the determined visual word frequencies (e.g., the respective frequency vectors) of the identified 928 cells in relation to the visual word frequencies of the selected compound structure type (e.g., those determined at step 1212 in FIG. 12). In one arrangement, the categorization engine 232 may compare each respective visual word frequency of each of the identified 928 cells to the corresponding visual word frequency of the selected compound structure type to determine whether the two frequencies are "close enough" (e.g., whether a distance between the two frequencies is small enough). Strictly for purposes of example, in the event that the frequency vector of the identified 912 visual words for the selected compound structure type included the entries 30, 10, 0 (corresponding to Visual Word₁, Visual Word₂ and Visual Word₄ in FIG. 5) while that of cell 608₂ included the entries 12, 0, 30 as discussed above, the categorization engine 232 may determine that cell 608₂ does not represent the selected compound structure type, as each entry in the vector for cell 608₂ may be determined to not be close enough to the respective entry in the vector for the selected compound structure type (e.g., via any appropriate dissimilarity analysis or the like). However, entries of 25, 5, 5 in the frequency vector for cell 608₂ may be determined to be close enough, and cell 608₂ would thus be categorized as representing the selected compound structure type (e.g., urban, orchard, field, etc.).
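
One possible form of such a "close enough" test is an entry-wise tolerance check, sketched below under the assumption that both vectors range over the same identified visual words; the tolerance value is a hypothetical tuning parameter.

```python
import numpy as np

def represents_type(cell_vector, type_vector, tolerance):
    """Categorize a cell as representing the selected compound structure
    type when every entry of its frequency vector y is within the given
    tolerance of the corresponding entry of the type's frequency vector
    (one appropriate dissimilarity analysis among many)."""
    difference = np.abs(np.asarray(cell_vector, dtype=float)
                        - np.asarray(type_vector, dtype=float))
    return bool(np.all(difference <= tolerance))
```

With a type vector of [30, 10, 0] and a tolerance of 5, the cell vector [12, 0, 30] is rejected while [25, 5, 5] is accepted, matching the example above.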

In another arrangement, the categorization engine 232 may train any appropriate linear classifier to make a decision about whether or not the cells 276 in the grid 272 are to be categorized as the selected compound structure type. Turning now to FIG. 10, a categorization method 1000 may include assigning 1004 weighting values to each of the identified 912 visual words (from FIG. 9) to convey a relative importance each visual word has in determining whether or not a particular cell falls within the selected 904 compound structure type. With reference back to the method 1200 of FIG. 12, for instance, the selecting 1204 step may additionally include selecting ROIs that do not represent the selected compound structure type (i.e., negative examples of the selected compound structure type) and then additionally performing the classifying 1208 and determining 1212 steps in relation to the negative ROI examples. In one arrangement, the visual word frequency vectors x of all of the ROIs may need to be normalized as the ROIs may be of different shapes and/or sizes. For instance, the frequency vectors x can be normalized by dividing each visual word frequency vector by the area of its respective ROI.

The normalized visual word frequency vectors can then be associated with or labeled as positive or negative examples of the selected compound structure type (where any ROIs labeled as negative examples of the selected compound structure type may also be positive examples of other selected compound structure types). The categorization engine 232 may then use the normalized visual word frequency vectors to train a linear classifier to output a weighting value for each of the visual words that conveys a relative importance the visual word has in determining whether or not a particular cell 276 falls within the selected compound structure type, and output a threshold to be used to categorize the cells 276 as falling within or not falling within the compound structure type as discussed below.
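
The specification leaves the choice of linear classifier open; as one hypothetical possibility, a simple perceptron over the area-normalized frequency vectors could produce both the weighting values and the threshold, as sketched below.

```python
import numpy as np

def train_linear_classifier(vectors, labels, epochs=100, learning_rate=0.01):
    """Train a perceptron-style linear classifier on area-normalized ROI
    frequency vectors.

    vectors: (n_rois, n_words) normalized visual word frequency vectors x.
    labels: +1 for positive ROI examples of the selected compound
    structure type, -1 for negative examples.
    Returns the weighting vector (one value per visual word) and the
    threshold t used to categorize cells."""
    X = np.asarray(vectors, dtype=float)
    y = np.asarray(labels, dtype=float)
    weights = np.zeros(X.shape[1])
    t = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ weights - t) <= 0:   # misclassified example
                weights += learning_rate * yi * xi
                t -= learning_rate * yi        # move the threshold too
    return weights, t
```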

For instance, a weighting vector of the weighting values may be expressed as follows:

wᵢ = [c₁(Visual Word₁), c₂(Visual Word₂), . . . , cₙ(Visual Wordₙ)],

where each of the entries c₁, c₂, etc. corresponds to the weighting value or factor corresponding to each of the visual words.

The method 1000 of FIG. 10 may then include obtaining 1008, for each cell, a product of each visual word frequency and its respective weighting value, adding 1012 the products for each cell, and then querying 1016 whether each sum is equal to or above a threshold t. A cell may be categorized 1020 as the selected compound structure type in response to a positive answer to the query 1016 and may be categorized 1024 as not (e.g., other than) the selected compound structure type in response to a negative answer to the query 1016.

In the case of cell 608₂ of FIG. 6, for instance, the steps 1008, 1012 and query 1016 may be expressed as follows:

${{If}\;\begin{bmatrix}{b\; 1\; \left( {{Visual}\mspace{14mu} {Word}\; 1} \right)} & {b\; 2\; \left( {{Visual}\mspace{14mu} {Word}\; 2} \right)} & {b\; 4\; \left( {{Visual}\mspace{14mu} {Word}\; 4} \right)}\end{bmatrix}} \times {\quad{\begin{bmatrix}{c\; 1\; \left( {{Visual}\mspace{14mu} {Word}\; 1} \right)} \\{c\; 2\; \left( {{Visual}\mspace{14mu} {Word}\; 2} \right)} \\{c\; 4\; \left( {{Visual}\mspace{14mu} {Word}\; 4} \right)}\end{bmatrix}{\quad{{\geq t},{{{{then}\mspace{14mu} {cell}} = {1\; \left( {{selected}\mspace{14mu} {compound}\mspace{14mu} {structure}\mspace{14mu} {type}} \right)}};{{{Otherwise}\mspace{14mu} {cell}} = 0\; \left( {{not}\mspace{14mu} {selected}\mspace{14mu} {compound}\mspace{14mu} {structure}\mspace{14mu} {type}} \right)}}}}}}$

Stated differently, the dot product of a) the visual word frequency vector of a cell (where the entries correspond to the identified 912 words, such as a subset of all visual words as discussed above) and b) the weighting vector that includes weighting value entries corresponding to the visual words in the visual word frequency vector may be obtained and then compared to the threshold t to determine whether or not to categorize the cell as one of two classes: a first class that includes the selected compound structure type or a second class that includes all other compound structure types. While the expression has been discussed in the context of categorizing the cell as falling within the selected compound structure type when the dot product is equal to or above the threshold, other embodiments envision doing so when the dot product is below the threshold, or equal to or below the threshold. A similar process may be performed to categorize other of the identified 928 cells as the selected compound structure type or not the selected compound structure type.
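
The query itself reduces to a few lines; the sketch below assumes the weighting vector and threshold come from a routine such as train_linear_classifier above.

```python
import numpy as np

def categorize_cell(cell_vector, weights, t):
    """Steps 1008-1024 of FIG. 10: multiply each identified visual word
    frequency by its weighting value, sum the products, and compare the
    sum to the threshold t.  Returns 1 for the selected compound
    structure type and 0 otherwise."""
    score = float(np.dot(cell_vector, weights))
    return 1 if score >= t else 0
```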

Returning to FIG. 9, the method 900 may, for any cells not identified at 928 as containing at least one of the identified 924 connected components (e.g., cell 608₄ in FIG. 6), categorize 940 any portions of such cells that are not overlapped by other cells categorized as the selected compound structure type as not falling within the selected compound structure type. With reference to FIG. 6, for instance, cell 608₄ does not contain any of the identified 924 components 612, 616, 620, 624, 628. However, cell 608₄ is partially overlapped by cell 608₂ which includes connected components 612, 624. In this regard, the portion 609 of cell 608₄ not overlapped by cell 608₂ may be automatically categorized as not falling within the selected compound structure type. In one arrangement, the remaining portion 610 of cell 608₄ may be automatically categorized in the same manner as cell 608₂. In another arrangement, the remaining portion 610 of cell 608₄ may be categorized according to any appropriate logic or algorithm that factors in contributions from both of cells 608₂ and 608₄ (as portion 610 is also an overlapping portion of cells 608₂ and 608₄).

The categorized cells 276 may be used by the mapping engine 236 to prepare one or more resultant images 280 (e.g., of the same spatial extent as the one or more input overhead images) that depict the selected compound structure type at appropriate locations in the geographic area. See FIG. 2. For instance, any pixels of the one or more input overhead images falling within a cell categorized as the selected compound structure type may be labeled as the selected compound structure type while all other pixels may not be so labeled. Those pixels labeled as representing the selected compound structure type may be appropriately colored, shaded, etc. to depict that such pixels represent the selected compound structure type.

The methods in FIGS. 9-12 may be performed with respect to other selected compound structure types with the same or different dictionaries 256 of visual words 260, the same or different identified 912 visual words (e.g., different subsets of visual words), with the same or different grids having various cell parameters (e.g., same or different width and overlap parameters w, r), etc. to categorize other cells of other grids, where each query for each respective selected compound structure type is a "one v. rest" query (i.e., does each cell fall within the class of the selected compound structure type or the class of all other compound structure types?). The various queries may be merged in any appropriate manner to generate a resultant image that depicts various compound structure types within a geographic area. For instance, see FIG. 20, which presents an image of Rio de Janeiro, Brazil of the same spatial extent as that in FIG. 13 and that depicts informal settlements (e.g., slums), residential patterns, and urban patterns in blue, green and red, respectively.

In one arrangement, the mapping engine 236 may maintain any appropriate data structure for each pixel of the one or more overhead images that keeps track of the categorization of each cell within which the pixel is disposed. For instance, assume that three different queries were run on the overhead image of FIG. 13 for informal settlements, residential patterns, and urban patterns, respectively. Also assume that after each query, the data structure of a particular pixel includes entries of 2, 0, 0 indicating that the pixel had fallen within cells categorized as an informal settlement during the informal settlement query but had not fallen within a cell categorized as either a residential or urban pattern. Thus, the particular pixel may be categorized as an informal settlement in the resultant image 280. However, assume now that the data structure of a particular pixel includes entries of 1, 1, 1 indicating that the pixel had fallen within a cell categorized as an informal settlement during the informal settlement query, a cell categorized as residential during the residential query, and a cell categorized as urban during the urban query. In this case, the mapping engine 236 may represent the pixel as a combination of blue, green and red; as a different color to indicate that it was not determined how the pixel should be categorized; or the like.
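
A minimal sketch of such a per-pixel data structure and merge rule follows, assuming one tally per query as in the example above; the handling of contested pixels is one hypothetical choice among those described.

```python
def merge_queries(pixel_counts):
    """pixel_counts: one entry per 'one v. rest' query recording how many
    categorized cells the pixel fell within (e.g., [2, 0, 0] above).
    Returns the index of the single query that claims the pixel, or None
    when zero or several queries claim it (which the mapping engine 236
    may render as a blend or as a dedicated 'undetermined' color)."""
    claiming = [i for i, count in enumerate(pixel_counts) if count > 0]
    return claiming[0] if len(claiming) == 1 else None
```

For instance, merge_queries([2, 0, 0]) yields 0 (informal settlement), while merge_queries([1, 1, 1]) yields None.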

In one arrangement, the categorization engine 232 may maintain one or more inverted files 284 of connected components 244 indexed according to the particular visual word 260 by which the connected components are classified, store the inverted files 284 in storage 212, and then extract and load only those portions 288 needed in memory 204 during the categorization process (steps 912 and 924 in FIG. 9). The inverted files may allow for the compact (e.g., compressed) storage of the compound structure representation as well as fast queries with a limited number of visual words. A different inverted file may be maintained for each visual word dictionary 256.

For instance, FIG. 7 illustrates a schematic diagram of an inverted file 700 built with the visual words 520 of FIG. 5 and the connected components 308 of FIG. 3. As shown, the inverted file 700 includes a plurality of rows 704 each indexed according to one of the visual words 520 and each including a list of the components 308 classified by the visual word by which the row 704 is indexed (e.g., via the method 1100 of FIG. 11). For example, each connected component 308 in the inverted file 700 may be identified by a "bounding box" represented by a number of functions or coordinates that provide the spatial extent of the connected component 308 within the overhead image(s) of the geographic area and may include any other appropriate data or metadata for the connected component 308. In relation to the example provided above in which the categorization engine 232 identifies Visual Word₁, Visual Word₂ and Visual Word₄ in FIG. 5 as representing the selected 904 compound structure type, the categorization engine 232 may automatically identify 924 the subset of connected components classified by the identified visual words by extracting the rows 704 of connected components 308 in the inverted file 700 indexed by Visual Word₁, Visual Word₂ and Visual Word₄ and loading the extracted rows into memory 204 for use in performing the identifying step 928, determining step 932, etc.
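
A minimal sketch of such an inverted file as a word-to-components mapping follows, with bounding boxes as (left, top, right, bottom) tuples; the in-memory representation is hypothetical, and any compressed on-disk layout could stand behind the same interface.

```python
from collections import defaultdict

def build_inverted_file(component_boxes, component_words):
    """Index connected components by the visual word that classifies
    them, one row per visual word as in FIG. 7.

    component_boxes: bounding box (left, top, right, bottom) per component.
    component_words: visual word index assigned to each component."""
    inverted = defaultdict(list)
    for box, word in zip(component_boxes, component_words):
        inverted[word].append(box)
    return inverted

def load_rows(inverted, identified_words):
    """Extract only the rows indexed by the identified 912 visual words
    (step 924), leaving the remainder of the file in storage."""
    return {word: inverted.get(word, []) for word in identified_words}
```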

In one arrangement, only the connected components 308 representing salient structures need to be stored in the inverted file 700, which may amount to only a fraction of the total number of pixels. Furthermore, the size of the inverted file 700 need not necessarily depend on any cell parameters (e.g., the width and/or overlap parameters w, r) and may depend solely on the number of connected components classified by the particular visual words, making the inverted file adaptable to almost any choice of w and r as well as almost any choice of sub-clustering coming from the second hierarchical clustering structure (used to create the visual words).

Non-Limiting Example:

A WorldView-2 image acquired in July 2011 that is 8774×33975 pixels and that partially covers the city of Rio de Janeiro, Brazil is acquired. See FIG. 13. Patterns making up the following three classes of compound structure types in the city are analyzed: informal settlements, residential and industrial. Two examples of each class, plus another class gathering water, grassland and forest, are provided for training the system. See FIG. 12. The examples (e.g., ROIs) provided for informal settlements, residential and industrial are shown in FIGS. 14-16. For instance, the informal settlements are made of small ground shelters that are very close to each other without any regular spatial arrangement. The residential parts contain houses and small buildings which are separated by roads and trees while exhibiting at least some spatial arrangement. The industrial parts contain large buildings that are separated by large roads and that exhibit strong linear features on their roofs while being surrounded by many vehicles.

In this experiment, 512 visual words are generated. During the learning phase, the typical compound structure scale is estimated to be around 100 meters, which thus may define the width parameter of the tiles (e.g., cells) of the grid. Given an overlap of 80 meters between neighboring tiles, query results having a resolution of 20 meters (i.e., the 100 meter tile width less the 80 meter overlap) are obtained. In this example, the representation, which is stored on 5% of the image storage space, allows the interactive classification of the equivalent of 877×3397 (≈3 million) tiles each having a distribution of 512 values (corresponding to the 512 visual words). During the learning stage, a threshold of 0.4 is applied to the Pearson's correlation such that between 10% and 20% of the 512 visual words are retained for each query. FIGS. 17a, 18a, and 19a illustrate portions of informal settlements, residential parts, and industrial parts of the overhead image of FIG. 13, respectively. FIGS. 17b, 18b, and 19b illustrate a plurality of different combinations of connected components and their respective visual words over the informal settlements, residential parts and industrial parts of FIGS. 17a, 18a, and 19a, respectively. The resulting "one v. rest" queries (e.g., informal settlements v. everything else, residential v. everything else, etc.) are fused into a single classification shown in FIG. 20. The categorization of input overhead images as disclosed herein may be used in numerous contexts such as in assessing population densities, qualities of life, vulnerability factors, disaster risks, sufficiency of civil infrastructures, economic growth, poverty levels, event monitoring and evolution, and the like.

It will be readily appreciated that many deviations and/or additions may be made from or to the specific embodiments disclosed in the specification without departing from the spirit and scope of the invention. Embodiments disclosed herein can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. For example, the logic or software of the construction engine 228, categorization engine 232 and mapping engine 236 responsible for the various functionalities disclosed herein may be provided in such computer-readable medium of the automated categorization system 200 and executed by the processor 208 as appropriate. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a non-volatile memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them. In this regard, the system 200 may encompass one or more apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the system 200 may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (also known as a program, software, software application, script, or code) used to provide any of the functionalities described herein (e.g., construction of the first and second hierarchical data structures and the like) can be written in any appropriate form of programming language including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). Processors suitable for the execution of a computer program may include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Generally, the elements of a computer are one or more processors for performing instructions and one or more memory devices for storing instructions and data. The techniques described herein may be implemented by a computer system configured to provide the functionality described.

While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the disclosure. Furthermore, certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and/or parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software and/or hardware product or packaged into multiple software and/or hardware products.

The above described embodiments, including the preferred embodiment and the best mode of the invention known to the inventor at the time of filing, are given by illustrative examples only.

What is claimed is:
1. A method for use in characterizing areas of interest in overhead imagery, comprising: organizing, using a processor, a plurality of pixels of at least one input overhead image into a plurality of components of a first hierarchical data structure, wherein the input image is associated with a geographic area; deriving, using the processor, at least one image descriptor for each of the components; constructing, using the processor with at least some of the image descriptors, a second hierarchical data structure that includes a plurality of hierarchically-arranged levels of nodes, wherein the nodes of the second hierarchical data structure are arranged into a plurality of leaf paths that each extend from a root node to one of a plurality of leaf nodes, wherein each of the image descriptors depends from one of the plurality of leaf nodes, wherein at least one node at a particular one of the levels is a respective visual word, and wherein all of the visual words at the particular one of the levels collectively comprise a visual dictionary; determining, for each cell of at least some cells of a grid of cells disposed over the geographic area, a frequency of each of one or more of the visual words in the cell; and generating, for the cell, a vector that includes a plurality of entries, wherein each entry includes a value that represents a respective one of the frequencies.
2. The method of claim 1, further including: for each cell of the at least some of the cells, using the cell vector to categorize a type of compound structure found within the cell.
3. The method of claim 2, wherein the using includes comparing the cell vector to each of a plurality of different compound structure vectors.
4. The method of claim 1, wherein the determining step includes: identifying at least one of the plurality of components within the cell; ascertaining a smallest distance between the at least one image descriptor of the identified component and each of the at least one image descriptor depending from each of a plurality of the visual words; and classifying, based on the ascertaining step, the component as a particular one of the visual words associated with the smallest distance, wherein the classifying step contributes to the frequency of the particular one of the visual words in the cell vector.
5. The method of claim 4, wherein the determining step further includes: identifying another of the plurality of components within the cell; ascertaining a smallest distance between the at least one image descriptor of the identified another component and each of the at least one image descriptor depending from each of the plurality of the visual words; and classifying, based on the ascertaining step, the another component as another particular one of the visual words associated with the smallest distance, wherein the classifying step contributes to the frequency of the another particular one of the visual words in the cell vector.
6. The method of claim 1, wherein the grid of cells is defined by a width parameter that sets forth a width of each of the cells and an overlap parameter that sets forth a displacement between adjacent cells of the grid of cells.
7. The method of claim 6, wherein the grid of cells is a first grid, wherein the method further includes: generating, for each cell of a plurality of cells of a second grid of the geographic area, a vector that includes a plurality of entries, wherein each entry includes a value that represents a respective frequency of one of the visual words in the cell.
8. The method of claim 7, wherein one of the entries of the cell vectors of the first grid includes a value that represents a frequency of a first of the visual words in the cell, wherein one of the entries of the cell vectors of the second grid includes a value that represents a frequency of a second of the visual words in the cell, and wherein the first and second visual words are different.
9. The method of claim 7, further including: using the cell vectors of the first grid to determine whether or not the respective cells of the first grid represent a first area of interest; and using the cell vectors of the second grid to determine whether or not the respective cells of the second grid represent a second area of interest, wherein the first and second areas of interest are different.
10. The method of claim 1, further including: identifying a subset of visual words of the visual dictionary that corresponds to an area of interest in the input overhead image; identifying a subset of the plurality of components classified by one of the visual words in the subset of visual words; and identifying a subset of the plurality of cells that each fully contain at least one of the components in the subset of components, wherein the determining and generating steps are performed only with respect to the subset of visual words and the subset of cells.
11. The method of claim 10, wherein each of at least some of the components in the subset of components is fully contained within at least two of the cells.
12. The method of claim 10, further including: assigning respective weighting values to each of the visual words in the subset of visual words, wherein each weighting value represents a degree to which the respective visual word indicates an area of interest; for each cell of the subset of cells: obtaining, for each entry in its cell vector, a product of the entry and a corresponding one of the weighting values to obtain a plurality of products; summing the plurality of products to obtain a sum; identifying the cell as the area of interest if the sum has one of i) reached or exceeded a threshold and ii) not reached the threshold; and identifying the cell as not the area of interest if the sum has the other of i) reached or exceeded the threshold and ii) not reached the threshold.
13. The method of claim 10, further including: classifying each of the plurality of components as one of the plurality of visual words; storing, in a storage device, each of the plurality of components in a data structure that is indexed according to the plurality of visual words; extracting the components indexed according to the subset of visual words, wherein the extracted components are the subset of components; and loading the extracted components and subset of visual words into memory, wherein the determining and generating steps are performed with respect to the extracted components and subset of visual words as loaded into memory.
14. The method of claim 1, further including: classifying each of the plurality of components as one of the plurality of visual words; and storing, in a storage device, each of the plurality of components in a data structure that is indexed according to the plurality of visual words.
15. The method of claim 1, wherein the visual dictionary is a first visual dictionary, and wherein all of the visual words at another particular one of the levels collectively comprise a second visual dictionary at least partially different than the first visual dictionary.
16. A method of characterizing areas of interest in overhead imagery, comprising: organizing, with a processor, an input overhead image associated with a geographic area into a plurality of components of a first hierarchical data structure; clustering, with the processor, a plurality of image descriptors derived from the plurality of components into a plurality of visual words of a visual dictionary; and categorizing, with the processor, each of at least some of a plurality of cells of the geographic area as one or more areas of interest based on one or more respective ones of the visual words in the cell.
17. The method of claim 16, further including: determining, for each of at least some cells of the plurality of cells of the geographic area, a frequency of each of one or more of the visual words in the cell; generating, for each of the cells, a vector that includes a plurality of entries, wherein each entry includes a value that represents a respective one of the visual word frequencies; and characterizing a type of one or more portions of interest within each of the plurality of cells based on the respective cell vector.
18. The method of claim 17, wherein the characterizing step utilizes less than all of the entries in the vector.
19. The method of claim 17, further including: mapping the plurality of cells into a resultant image that is associated with the geographic area, wherein each cell in the resultant image depicts its respective characterized type.
20. The method of claim 17, further comprising: selecting a sample of one of the areas of interest in the input overhead image; determining, for the sample, a frequency of each of one or more of the visual words in the sample; and generating, for the sample, a vector that includes a plurality of entries, wherein each entry includes a value that represents a respective one of the visual word frequencies, wherein the categorizing uses the sample vectors and the cell vectors.